docs: update site after scraper extraction to standalone repo #31
Merged
nitaibezerra merged 5 commits into main on Feb 25, 2026
- Add docs/arquitetura/postgresql.md with the detailed schema
- Add docs/modulos/data-platform.md documenting the new repo
- Add docs/workflows/airflow-dags.md with the Composer DAGs
- Update diagrams in visao-geral.md and fluxo-de-dados.md
- Update index.md with the new architecture and repositories
- Update componentes-estruturantes.md (HF is now distribution)
- Update mkdocs.yml with the new files in the navigation

Main change: PostgreSQL (Cloud SQL) is now the source of truth; HuggingFace becomes the distribution layer for open data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Simplify scraper.md with a redirect to data-platform
- Update typesense-local.md to use PostgreSQL as the source
- Update cogfy-integracao.md with the PostgreSQL flow
- Update scraper-pipeline.md with 7 jobs (including embeddings)
- Update typesense-data.md to sync from PostgreSQL
- Update arquitetura-gcp.md with Cloud SQL, Composer, Embeddings API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update setup-backend.md to use data-platform and PostgreSQL
- Update roteiro-onboarding.md with the new repositories
- Update setup-datascience.md with a PostgreSQL diagram
- Remove references to the archived repos (scraper, typesense)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rewrite scraper module page for the new standalone repo with API + DAGs. Remove scraping from data-platform module (CLI, StorageAdapter, env vars). Rewrite scraper pipeline as two-stage architecture (Airflow + GH Actions). Update docker builds, airflow DAGs, onboarding, and data flow diagrams.
…xtraction

# Conflicts:
#   docs/arquitetura/fluxo-de-dados.md
#   docs/modulos/data-platform.md
#   docs/modulos/scraper.md
#   docs/onboarding/setup-backend.md
#   docs/workflows/airflow-dags.md
#   docs/workflows/scraper-pipeline.md
Summary
- modulos/scraper.md: standalone repo with API endpoints, DAGs, deploy info
- modulos/data-platform.md: remove scraping, CLI scrape commands, StorageAdapter, env vars
- workflows/scraper-pipeline.md: two-stage architecture (Airflow scraping + GH Actions enrichment)
- workflows/docker-builds.md: scraper uses Artifact Registry + Cloud Run (not GHCR)
- workflows/airflow-dags.md: add scraper DAGs, bucket subdirectories, remove pandas
- onboarding/setup-backend.md: remove scrape CLI commands, point to scraper repo
- arquitetura/fluxo-de-dados.md: diagrams and steps reflect Airflow-based scraping

Context
The scraper was extracted from `data-platform` to `destaquesgovbr/scraper`. All 7 doc pages had stale references to the old architecture (CLI scraping, GHCR images, single-repo pipeline).

Test plan
- `mkdocs build` compiles without errors
- No `data-platform scrape` or `ghcr.io/...scraper` references remain

🤖 Generated with Claude Code
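The second test-plan item (no stale references remain) could be checked mechanically. The following is a minimal sketch, not part of the PR: the function name `find_stale_references`, the `STALE_PATTERNS` list, and the loose regex standing in for the elided `ghcr.io/...scraper` path are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed patterns for the stale references named in the test plan.
# The "..." in ghcr.io/...scraper is elided in the PR, so it is matched
# loosely here as "any non-space characters".
STALE_PATTERNS = [
    re.compile(r"data-platform scrape"),
    re.compile(r"ghcr\.io/\S*scraper"),
]


def find_stale_references(docs_dir):
    """Return (path, line number, line) for every Markdown line that
    matches one of the stale patterns, in sorted path order."""
    hits = []
    for path in sorted(Path(docs_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(p.search(line) for p in STALE_PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_stale_references("docs")` from the repository root (assuming the pages live under `docs/`, as the commit messages suggest) should return an empty list once the cleanup is complete. For the first item, running `mkdocs build --strict` additionally treats warnings such as broken internal links as errors.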